Predictive Maintenance
## Warning: 952 failed to parse.
RESAMPLE RATE: 1 cycle = 15 minutes
MACHINE FAILURE DETECTED WITHIN: 24 hours = 96 cycles
MINIMUM DATA REQUIRED: 3.25 hours = 13 cycles
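All three settings are tied together by the 15-minute resample rate, which gives 4 cycles per hour. The conversion can be sketched in Python as follows. The constant and function names are illustrative and are not taken from the notebook.

```python
# Cycle/time conversions for the 15-minute resampling grid.
RESAMPLE_MINUTES = 15
CYCLES_PER_HOUR = 60 // RESAMPLE_MINUTES  # 4 cycles per hour


def hours_to_cycles(hours: float) -> int:
    """Convert a duration in hours to a whole number of 15-minute cycles."""
    cycles = hours * CYCLES_PER_HOUR
    if cycles != int(cycles):
        raise ValueError(f"{hours} h is not a whole number of cycles")
    return int(cycles)


def cycles_to_hours(cycles: int) -> float:
    """Convert a number of 15-minute cycles back to hours."""
    return cycles / CYCLES_PER_HOUR


FAILURE_WINDOW_CYCLES = hours_to_cycles(24)  # 96 cycles
MIN_HISTORY_CYCLES = hours_to_cycles(3.25)   # 13 cycles
```

Using the same helper in both directions keeps a change of resample rate from leaving the failure window and the minimum history out of sync.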

1 Data Wrangling


1.1 Load data

1.1.1 SPECTO

Data is resampled every 15 minutes.

RESAMPLED MOTOR DATA:
[Rows - Observations]: 384740
[Columns - Variables]: 50
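The notebook does not show how the SPECTO signals were aggregated onto the 15-minute grid. The sketch below assumes a per-bin mean, although the real pipeline may use a different aggregate or resample each motor separately. It shows the downsampling step in plain Python, and the function name is hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def resample_mean(readings, minutes=15):
    """Downsample (timestamp, value) readings to fixed bins by averaging.

    Each reading goes to the bin that starts at its timestamp floored to a
    multiple of `minutes` (assumes `minutes` divides 60). Returns a sorted
    list of (bin_start, mean_value) pairs.
    """
    bins = defaultdict(list)
    for ts, value in readings:
        floored = ts - timedelta(minutes=ts.minute % minutes,
                                 seconds=ts.second,
                                 microseconds=ts.microsecond)
        bins[floored].append(value)
    return [(start, mean(values)) for start, values in sorted(bins.items())]
```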

1.1.2 MINE

TRANSMUTED MINE DATA:
[Rows - Observations]: 5719
[Columns - Variables]: 15

1.1.3 SAP data

SAP DATA:
[Rows - Observations]: 2002
[Columns - Variables]: 10

1.4 Descriptive Statistics

2 Exploratory data analysis


3 Feature Engineering


3.2 Generating train and test data

FIRST DATE: 2019-09-18 10:45:00

fail_id  fail_SS     caex  fail_code  LIFETIME  FIRST_TTF  LAST_TTF  LAG_TTF  MAX_CYCLE  MIN_CYCLE
     44  electrical    53          3       661        277         0      260        661          1
     81  engine        57          1       175        174         0       64        175          1
    129  electrical    62          3        71         70         0      254         71          1
    176  engine        63          1       106        105         0      240        106          1
    200  engine        64          1       449         65         0       32        449          1
    228  structure     67          5      1071        687         0      915       1071          1
    259  hidraulics    74          2       233        232         0      756        233          1
    290  structure     80          5        16         15         0       45         16          1
    321  cab_attach    83          4       722        338         0      205        722          1
    353  cab_attach    85          4       469         85         0      594        469          1
    384  electrical    86          3        32         31         0       75         32          1
    401  engine        89          1      1829       1445         0      725       1829          1
    409  cab_attach    89          4       412         28         0      399        412          1
    440  cab_attach    90          4      1995       1611         0       42       1995          1
    469  electrical    92          3       388          4         0     1744        388          1
    495  structure     97          5       548        164         0       95        548          1
    520  structure     98          5        95         94         0       27         95          1
    554  hidraulics    99          2      1694       1310         0      371       1694          1
    575  hidraulics   100          2       786        402         0     1775        786          1
    596  engine       102          1       186        185         0     2365        186          1
    636  electrical   103          3       351        350         0       49        351          1
    659  hidraulics   105          2      2341       1957         0      623       2341          1
    688  engine       106          1      2257       1873         0      935       2257          1
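In this table MIN_CYCLE is always 1 and MAX_CYCLE always equals LIFETIME. In the test predictions of section 4.1.2, most rows also satisfy RUL = LIFETIME - cycle. The sketch below shows one way to derive per-cycle RUL and a binary 24-hour alarm label from that convention. The function name and the two-class label are assumptions: the classifier in 4.1 actually predicts three classes, and their definitions are not shown here.

```python
FAILURE_WINDOW_CYCLES = 96  # 24 hours at 15 minutes per cycle


def rul_labels(lifetime: int, window: int = FAILURE_WINDOW_CYCLES):
    """Return (cycle, RUL, alarm) rows for one failure history.

    Cycles run from 1 to `lifetime`. RUL counts the cycles left until the
    failure recorded at the end of the history, and `alarm` is 1 when the
    failure falls within the next `window` cycles.
    """
    rows = []
    for cycle in range(1, lifetime + 1):
        rul = lifetime - cycle
        rows.append((cycle, rul, int(rul <= window)))
    return rows
```

For fail_id 44 (LIFETIME 661), cycle 574 gets RUL 87, which is the value reported for that failure in 4.1.2.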

4 Modelling


4.1 Which Engines will fail in the Current Period?


4.1.1 Train Model

eXtreme Gradient Boosting: Multiclass

test
Fold 1 started at Thu Jan 16 01:52:10 2020
[0] validation_0-merror:0.149758    validation_1-merror:0.28608 validation_0-cappa:-0.465242    validation_1-cappa:-0.07564
Multiple eval metrics have been passed: 'validation_1-cappa' will be used for early stopping.

Will train until validation_1-cappa hasn't improved in 100 rounds.
[99]    validation_0-merror:0.030843    validation_1-merror:0.329706    validation_0-cappa:-0.9004  validation_1-cappa:-0.008062
Fold 2 started at Thu Jan 16 01:55:31 2020
[0] validation_0-merror:0.153264    validation_1-merror:0.29313 validation_0-cappa:-0.435045    validation_1-cappa:0.019125
Multiple eval metrics have been passed: 'validation_1-cappa' will be used for early stopping.

Will train until validation_1-cappa hasn't improved in 100 rounds.
[99]    validation_0-merror:0.031689    validation_1-merror:0.306221    validation_0-cappa:-0.894728    validation_1-cappa:-0.042114
Fold 3 started at Thu Jan 16 01:58:47 2020
[0] validation_0-merror:0.169499    validation_1-merror:0.304162    validation_0-cappa:-0.348335    validation_1-cappa:0.023783
Multiple eval metrics have been passed: 'validation_1-cappa' will be used for early stopping.

Will train until validation_1-cappa hasn't improved in 100 rounds.
[99]    validation_0-merror:0.033079    validation_1-merror:0.311745    validation_0-cappa:-0.902507    validation_1-cappa:-0.088396
Fold 4 started at Thu Jan 16 02:02:01 2020
[0] validation_0-merror:0.156903    validation_1-merror:0.299635    validation_0-cappa:-0.40984 validation_1-cappa:-0.018407
Multiple eval metrics have been passed: 'validation_1-cappa' will be used for early stopping.

Will train until validation_1-cappa hasn't improved in 100 rounds.
[99]    validation_0-merror:0.031589    validation_1-merror:0.362664    validation_0-cappa:-0.898832    validation_1-cappa:0.0063

CV mean score on validation_0: 0.8991 +/- 0.0028 std.
CV mean score on validation_1: 0.0331 +/- 0.0365 std.
              precision    recall  f1-score   support

           0       0.80      0.83      0.81    278731
           1       0.17      0.11      0.13     45804
           2       0.09      0.11      0.10     28359

    accuracy                           0.68    352894
   macro avg       0.35      0.35      0.35    352894
weighted avg       0.66      0.68      0.67    352894
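The log suggests that the custom `cappa` metric is a negated Cohen's kappa. Early stopping minimizes it, and each CV summary equals minus the mean of the final-round values. For validation_1 above, that gives -(-0.008062 - 0.042114 - 0.088396 + 0.0063)/4 ≈ 0.0331. Read this way, a holdout kappa of about 0.03 means agreement barely above chance, despite the 0.68 accuracy. That is possible because class 0 makes up about 79% of the 352,894 rows. This reading is an inference from the log, since the metric's implementation is not shown. A pure-Python sketch of the metric and of the summary arithmetic:

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement between two labelings, corrected for chance."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: product of marginal class frequencies, summed over classes
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (observed - expected) / (1 - expected)


def neg_kappa(y_true, y_pred):
    """Negated kappa, so that a minimizing early-stopping rule favors higher kappa."""
    return -cohen_kappa(y_true, y_pred)


# Final-round validation_1 values from the four folds logged above.
final_cappa = [-0.008062, -0.042114, -0.088396, 0.0063]
cv_score = -sum(final_cappa) / len(final_cappa)  # about 0.0331
```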

4.1.2 Prediction on Test Data

fail_id  caex  fail_SS     RUL  LIFETIME  cycle  prob_normal  prob_alarm
     44    53  electrical   87       661    574    0.4989948   0.1856343
     81    57  engine      157       175     18    0.6032961   0.1793893
    129    62  electrical   70        71      1    0.4563932   0.2262334
    176    63  engine       92       106     14    0.5240706   0.1895707
    200    64  engine       65       449      2    0.4080988   0.2045117
    228    67  structure    95      1071    976    0.4868736   0.1947064
    259    74  hidraulics  111       233    122    0.6018161   0.1865952
    290    80  structure    15        16      1    0.2392261   0.4815701
    321    83  cab_attach   96       722    626    0.7187064   0.1353826
    353    85  cab_attach   85       469      1    0.3030930   0.2571960
    384    86  electrical   31        32      1    0.2534213   0.4327923
    401    89               NA        NA     NA           NA          NA
    409    89  cab_attach   28       412      1    0.5091531   0.2795297
    440    90  cab_attach   94      1995   1901    0.4814275   0.1955547
    469    92  electrical    4       388      1    0.2256989   0.4376528
    495    97  structure    93       548    455    0.5688560   0.1879514
    520    98  structure    94        95      1    0.4075282   0.2251982
    554    99  hidraulics   96      1694   1598    0.6927352   0.1343032
    575   100  hidraulics   91       786    695    0.5318039   0.1755869
    596   102  engine       95       186     91    0.4035516   0.2276198
    636   103  electrical   95       351    256    0.5916126   0.1702460
    659   105  hidraulics   96      2341   2245    0.7078847   0.1278846
    688   106  engine       97      2257   2160    0.7427008   0.1554266

4.1.3 Expected Profit

Following the book Data Science for Business (Provost and Fawcett), classification models can be compared by their expected value. The method builds a cost-benefit matrix aligned with the model's confusion matrix. It then collapses model performance into a single monetary value by weighting each confusion-matrix rate by its benefit or cost:

Expected Profit = Probability(+ve) x [TPR x benefit(TP) + FNR x cost(FN)] + Probability(-ve) x [TNR x benefit(TN) + FPR x cost(FP)]

The cost-benefit matrix should be provided by business domain experts. This project uses the following values:

  • True Positive (TP): engines that need maintenance and are selected by the model; benefit of $300K. Count: 15
  • True Negative (TN): engines that are fine and are not selected by the model; benefit of $0K. Count: 7
  • False Positive (FP): engines that are fine but are selected by the model; cost of -$100K. No count is reported; measuring this case would require adding negative (non-failure) examples to the test data.
  • False Negative (FN): engines that need maintenance but are not selected by the model; cost of -$200K. Count: 1
  • Accuracy score: 0.6521739
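Because p(+)·TPR = TP/N (and likewise for the other terms), the expected-profit formula reduces to a weighted average over cases. The sketch below uses the cost-benefit values listed above, and the function name is illustrative. The usage line assumes that the printed counts are TP = 15, TN = 7 and FN = 1, and it sets FP = 0 because no false-positive count is reported.

```python
def expected_profit(tp, fn, tn, fp, b_tp=300.0, c_fn=-200.0, b_tn=0.0, c_fp=-100.0):
    """Expected profit per case (in $K) from confusion-matrix counts.

    Implements p(+)[TPR*b(TP) + FNR*c(FN)] + p(-)[TNR*b(TN) + FPR*c(FP)].
    """
    n = tp + fn + tn + fp
    pos, neg = tp + fn, tn + fp
    value = 0.0
    if pos:  # skip the positive term when there are no actual positives
        value += (pos / n) * ((tp / pos) * b_tp + (fn / pos) * c_fn)
    if neg:  # skip the negative term when there are no actual negatives
        value += (neg / n) * ((tn / neg) * b_tn + (fp / neg) * c_fp)
    return value


ep = expected_profit(tp=15, fn=1, tn=7, fp=0)  # 4300/23, about 186.96 $K per case
```

Under the same reading, (TP + TN)/N would be 22/23 ≈ 0.957, while the reported accuracy of 0.6521739 equals 15/23. The printed counts or the accuracy definition may therefore differ from this interpretation.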

4.1.4 Probability density visualization

While investigating why the predicted probabilities were so small, I came across the following answer:

  • Re 1: If you predict well in the hold out sample then you’re doing well (no time to worry about propriety ;-) But since you’re asking…

One way to look at the threshold is that when you set it to 0.1 you are implicitly specifying a loss function. That is, separating the question of what to do (e.g. approach a customer) from what to infer (e.g. that the probability of 1 is 0.15). Indeed, you might make this separation a bit more explicit in your question. For example, you talk about needing to approach 5% of some people for something to be worthwhile. And then about how well you can predict cases. Is the issue that to approach the ‘right’ 5% (presumably the true ‘1’s) you might have to approach many more (true ‘0’s) to no effect? Then the cost of approach is relevant and the threshold should be set to minimise loss. But you also say you can predict the held out cases well when the threshold is set at 0.1…

  • Re 2: The cause of low probabilities is an unbalanced category distribution. This may cause estimation problems, though don’t automatically assume that it will. If it does you can often correct them quite easily by changing the training data set structure and correcting parameters or in other ways. There’s some discussion here, a link to a good paper, and much more discussion elsewhere in the site - just search for ‘unbalanced sample’.
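The answer's first point is that a threshold implicitly encodes a loss function. The cost-benefit values from 4.1.3 make this concrete. Flagging a case with alarm probability p pays off when p·b(TP) + (1 - p)·c(FP) > p·c(FN) + (1 - p)·b(TN). That is, when p > (b(TN) - c(FP)) / ((b(TP) - c(FN)) + (b(TN) - c(FP))) = 100/600 ≈ 0.167.

This rule assumes calibrated probabilities. With an unbalanced training set they may not be calibrated, so the sketch also picks a threshold empirically on labeled data. The names are illustrative.

```python
# Cost-benefit values from section 4.1.3, in $K.
B_TP, C_FN, B_TN, C_FP = 300.0, -200.0, 0.0, -100.0


def bayes_threshold(b_tp=B_TP, c_fn=C_FN, b_tn=B_TN, c_fp=C_FP):
    """Probability above which flagging beats not flagging, for calibrated p."""
    return (b_tn - c_fp) / ((b_tp - c_fn) + (b_tn - c_fp))


def profit(probs, labels, threshold):
    """Average profit per case when flagging every case with p >= threshold."""
    total = 0.0
    for p, y in zip(probs, labels):
        if p >= threshold:
            total += B_TP if y == 1 else C_FP
        else:
            total += C_FN if y == 1 else B_TN
    return total / len(probs)


def best_threshold(probs, labels):
    """Empirical threshold: try each observed score, plus 'flag nothing'."""
    candidates = sorted(set(probs)) + [float("inf")]
    return max(candidates, key=lambda t: profit(probs, labels, t))
```

For example, on toy scores [0.1, 0.2, 0.3, 0.45] with labels [0, 0, 1, 1], `best_threshold` returns 0.3, for an average profit of 150 $K per case.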

4.2 Failed subsystem classification


4.2.1 Train Model

eXtreme Gradient Boosting: Multiclass

RUL_1
test
Fold 1 started at Thu Jan 16 02:05:31 2020
[0] validation_0-merror:0.229919    validation_1-merror:0.786864    validation_0-cappa:-0.580573    validation_1-cappa:0.567523
Multiple eval metrics have been passed: 'validation_1-cappa' will be used for early stopping.

Will train until validation_1-cappa hasn't improved in 100 rounds.
[99]    validation_0-merror:1.6e-05 validation_1-merror:0.797784    validation_0-cappa:-0.999993    validation_1-cappa:0.3749
Fold 2 started at Thu Jan 16 02:06:48 2020
[0] validation_0-merror:0.223722    validation_1-merror:0.641349    validation_0-cappa:-0.575524    validation_1-cappa:0.476323
Multiple eval metrics have been passed: 'validation_1-cappa' will be used for early stopping.

Will train until validation_1-cappa hasn't improved in 100 rounds.
[99]    validation_0-merror:0.000178    validation_1-merror:0.609965    validation_0-cappa:-0.999667    validation_1-cappa:0.030558
Fold 3 started at Thu Jan 16 02:08:07 2020
[0] validation_0-merror:0.212913    validation_1-merror:0.740678    validation_0-cappa:-0.62386 validation_1-cappa:0.326893
Multiple eval metrics have been passed: 'validation_1-cappa' will be used for early stopping.

Will train until validation_1-cappa hasn't improved in 100 rounds.
[99]    validation_0-merror:1.6e-05 validation_1-merror:0.732832    validation_0-cappa:-0.999993    validation_1-cappa:0.414509
Fold 4 started at Thu Jan 16 02:09:26 2020
[0] validation_0-merror:0.190526    validation_1-merror:0.633036    validation_0-cappa:-0.627567    validation_1-cappa:0.32184
Multiple eval metrics have been passed: 'validation_1-cappa' will be used for early stopping.

Will train until validation_1-cappa hasn't improved in 100 rounds.
[99]    validation_0-merror:6.5e-05 validation_1-merror:0.610364    validation_0-cappa:-0.999839    validation_1-cappa:0.187801
Fold 5 started at Thu Jan 16 02:10:45 2020
[0] validation_0-merror:0.239106    validation_1-merror:0.624575    validation_0-cappa:-0.612178    validation_1-cappa:0.118006
Multiple eval metrics have been passed: 'validation_1-cappa' will be used for early stopping.

Will train until validation_1-cappa hasn't improved in 100 rounds.
[99]    validation_0-merror:9.7e-05 validation_1-merror:0.682657    validation_0-cappa:-0.999846    validation_1-cappa:0.227895
Fold 6 started at Thu Jan 16 02:12:04 2020
[0] validation_0-merror:0.264996    validation_1-merror:0.689694    validation_0-cappa:-0.402547    validation_1-cappa:0.287257
Multiple eval metrics have been passed: 'validation_1-cappa' will be used for early stopping.

Will train until validation_1-cappa hasn't improved in 100 rounds.
[99]    validation_0-merror:9.7e-05 validation_1-merror:0.693739    validation_0-cappa:-0.99981 validation_1-cappa:0.21288

CV mean score on validation_0: 0.9999 +/- 0.0001 std.
CV mean score on validation_1: -0.2414 +/- 0.1266 std.
              precision    recall  f1-score   support

           0       0.32      0.25      0.28     15148
           1       0.14      0.10      0.12      9708
           2       0.29      0.22      0.25     12404
           3       0.39      0.60      0.47     27321
           4       0.16      0.06      0.08      9582

    accuracy                           0.33     74163
   macro avg       0.26      0.24      0.24     74163
weighted avg       0.29      0.33      0.30     74163

4.2.2 Prediction on Test Data

4.2.2.1 Prediction over 12 hours from failure

fail_id  caex  real_SS     real_code  pred_code  pred_SS
     44    53  electrical          3          3  electrical
     81    57  engine              1          1  engine
    129    62  electrical          3          3  electrical
    176    63  engine              1          1  engine
    200    64  engine              1          1  engine
    228    67  structure           5          5  structure
    259    74  hidraulics          2          4  cab_attach
    321    83  cab_attach          4          4  cab_attach
    353    85  cab_attach          4          4  cab_attach
    401    89  engine              1          4  cab_attach
    440    90  cab_attach          4          4  cab_attach
    495    97  structure           5          5  structure
    520    98  structure           5          5  structure
    554    99  hidraulics          2          2  hidraulics
    575   100  hidraulics          2          2  hidraulics
    596   102  engine              1          1  engine
    636   103  electrical          3          3  electrical
    659   105  hidraulics          2          2  hidraulics
    688   106  engine              1          1  engine
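This table can be summarized as an event-level accuracy plus a confusion count. The small Python sketch below works on the 12-hour results as transcribed above. The mapping from codes to subsystems is read off the table's own real_code/real_SS columns, and the function name is illustrative.

```python
from collections import Counter

# (fail_id, real_code, pred_code) from the 12-hour table.
# Codes: 1 engine, 2 hidraulics, 3 electrical, 4 cab_attach, 5 structure.
PREDICTIONS_12H = [
    (44, 3, 3), (81, 1, 1), (129, 3, 3), (176, 1, 1), (200, 1, 1),
    (228, 5, 5), (259, 2, 4), (321, 4, 4), (353, 4, 4), (401, 1, 4),
    (440, 4, 4), (495, 5, 5), (520, 5, 5), (554, 2, 2), (575, 2, 2),
    (596, 1, 1), (636, 3, 3), (659, 2, 2), (688, 1, 1),
]


def summarize(rows):
    """Return (accuracy, misclassified fail_ids, (real, pred) confusion counts)."""
    confusion = Counter((real, pred) for _, real, pred in rows)
    wrong = [fid for fid, real, pred in rows if real != pred]
    accuracy = 1 - len(wrong) / len(rows)
    return accuracy, wrong, confusion
```

On these rows, 17 of 19 failures receive the correct subsystem, and both errors (259 and 401) are predicted as cab_attach. The 4-hour table below gives 19 of 21 correct, with the same two misclassified failures.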

4.2.2.2 Prediction over 4 hours from failure

fail_id  caex  real_SS     real_code  pred_code  pred_SS
     44    53  electrical          3          3  electrical
     81    57  engine              1          1  engine
    129    62  electrical          3          3  electrical
    176    63  engine              1          1  engine
    200    64  engine              1          1  engine
    228    67  structure           5          5  structure
    259    74  hidraulics          2          4  cab_attach
    321    83  cab_attach          4          4  cab_attach
    353    85  cab_attach          4          4  cab_attach
    384    86  electrical          3          3  electrical
    401    89  engine              1          4  cab_attach
    409    89  cab_attach          4          4  cab_attach
    440    90  cab_attach          4          4  cab_attach
    495    97  structure           5          5  structure
    520    98  structure           5          5  structure
    554    99  hidraulics          2          2  hidraulics
    575   100  hidraulics          2          2  hidraulics
    596   102  engine              1          1  engine
    636   103  electrical          3          3  electrical
    659   105  hidraulics          2          2  hidraulics
    688   106  engine              1          1  engine

4.3 Predicting Engine’s Remaining Useful Life (RUL)


4.3.1 Train Model

K-Fold LightGBM

RUL_1
RUL_2
RUL_CLASS
test
{'fold': 0, 'train size': 209048, 'eval size': 69683}
[150]   valid_0's rmse: 83.9531
Fold:0 RMSLE: 69.8348
{'fold': 1, 'train size': 209048, 'eval size': 69683}
[150]   valid_0's rmse: 81.4533
Fold:1 RMSLE: 68.5844
{'fold': 2, 'train size': 209048, 'eval size': 69683}
[150]   valid_0's rmse: 80.0952
Fold:2 RMSLE: 68.5327
{'fold': 3, 'train size': 209049, 'eval size': 69682}
[150]   valid_0's rmse: 80.9218
Fold:3 RMSLE: 68.5133
24
OOF RMSLE: 68.8686
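The fold sizes and the OOF score in this log can be checked arithmetically. The 278,731 rows split as 69,683 × 3 + 69,682, which matches scikit-learn's KFold rule that the first n_samples % n_splits folds get one extra sample. The OOF value also matches what pooling four root-mean-square errors gives: sqrt(Σ n_k·s_k² / N) ≈ 68.8686.

That agreement, together with the fact that a true RMSLE near 69 would imply predictions off by a factor of about e^69, suggests that the values labelled `RMSLE` are RMSE-type errors in cycles. That is roughly 17 hours at 15 minutes per cycle. This is an inference, as the metric code is not shown. A Python sketch with illustrative function names:

```python
import math


def kfold_sizes(n_samples: int, n_splits: int):
    """Fold sizes under the scikit-learn KFold rule: the first
    n_samples % n_splits folds receive one extra sample."""
    base, extra = divmod(n_samples, n_splits)
    return [base + 1 if i < extra else base for i in range(n_splits)]


def pooled_rms(fold_scores, fold_sizes):
    """Combine per-fold root-mean-square errors into one out-of-fold value."""
    total_sq = sum(n * s * s for s, n in zip(fold_scores, fold_sizes))
    return math.sqrt(total_sq / sum(fold_sizes))


def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error, shown for comparison."""
    return math.sqrt(sum((math.log1p(p) - math.log1p(t)) ** 2
                         for t, p in zip(y_true, y_pred)) / len(y_true))


sizes = kfold_sizes(209048 + 69683, 4)
fold_scores = [69.8348, 68.5844, 68.5327, 68.5133]
oof = pooled_rms(fold_scores, sizes)  # about 68.8686
```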

4.3.5 Visualize RUL curve

library(dplyr)
library(echarts4r)

# Plot real vs. predicted RUL (in hours) for a single failure event.
# Relies on dt_te (per-cycle RUL predictions) and te_xgb_fail (failure
# metadata), both created earlier in the notebook.
plot_RUL <- function(idx) {
  # Failure metadata shown in the chart title and subtitle
  fail_info <- te_xgb_fail %>% filter(fail_id == idx)
  caex_id   <- unique(fail_info$caex)
  fail_ss   <- unique(fail_info$fail_SS)

  # One cycle = 15 minutes, so multiply by 0.25 to convert cycles to hours
  p_RUL <- dt_te %>%
    filter(fail_id == idx) %>%
    mutate(cycle    = cycle * 0.25,
           pred_RUL = pred_RUL * 0.25,
           real_RUL = real_RUL * 0.25)

  # Last observed point, used to annotate both curves
  last_pt <- p_RUL %>% filter(cycle == max(cycle))

  p_RUL %>%
    e_charts(cycle, width = "100%", height = 400) %>%
    e_line(real_RUL, color = "#194f6e", name = "Real RUL") %>%
    e_line(pred_RUL, color = "#b1254c", name = "Predicted RUL") %>%
    e_title(paste("RUL Prediction - Fail ID:", idx),
            paste("CAEX", caex_id, "- Failed Subsystem:", fail_ss),
            textStyle = list(fontFamily = "arial", fontWeight = "bold",
                             color = "#4d4d4d"),
            subtextStyle = list(fontFamily = "arial", fontSize = 16)) %>%
    e_format_x_axis(suffix = " hours") %>%
    e_format_y_axis(suffix = " hours") %>%
    e_theme("dark") %>%
    e_legend(bottom = 0, textStyle = list(fontFamily = "arial", fontSize = 14)) %>%
    e_tooltip(trigger = "axis",
              axisPointer = list(type = "cross"),
              textStyle = list(fontFamily = "arial", fontSize = 14)) %>%
    e_mark_point(serie = "Real RUL",
                 data = list(xAxis = last_pt$cycle[1],
                             yAxis = last_pt$real_RUL[1],
                             value = paste(last_pt$real_RUL[1], "hours"))) %>%
    e_mark_point(serie = "Predicted RUL",
                 data = list(xAxis = last_pt$cycle[1],
                             yAxis = last_pt$pred_RUL[1],
                             value = paste(round(last_pt$pred_RUL[1]), "hours"))) %>%
    e_toolbox_feature(feature = "saveAsImage")
}

# Failure IDs whose RUL lies within +/- 1/6 of RUL_THRESHOLD (defined earlier)
p_var <- red_light %>%
  filter(RUL > RUL_THRESHOLD - RUL_THRESHOLD / 6,
         RUL < RUL_THRESHOLD + RUL_THRESHOLD / 6) %>%
  pull(fail_id)

4.3.5.1 Electrical Failures

4.3.5.2 Engine Failures

4.3.5.3 Structure Failures

4.3.5.4 Hydraulic Failures

4.3.5.5 Cabin and Attachment Failures

 




A work by Alonso M., Cecil V.